强化学习的标准制定缺乏指定禁止和禁止行为的实用方式。最常见的是,从业者通过手动工程来指定行为规范的任务,这是一个需要几个迭代的反向直观的过程,并且易于奖励代理人。在这项工作中,我们认为,几乎完全用于安全RL的受限制的RL,也有可能大大减少应用加强学习项目中奖励规范所花费的工作量。为此,我们建议在CMDP框架中指定行为偏好,并使用拉格朗日方法,该方法寻求解决代理程序的策略和拉格朗日乘法器之间的最小问题,以自动称量每个行为约束。具体而言,我们研究了如何调整CMDP,以便解决基于目标的任务,同时遵守一组行为约束,并提出对Sac-Lagrangian算法的修改以处理若干约束的具有挑战性的情况。我们对这一框架进行了一系列持续控制任务,该任务与用于视频游戏中NPC设计的加固学习应用相关。
translated by 谷歌翻译
我们在具有挑战性的3D视频游戏中处理规划和导航,其中包含使用特殊操作的代理商的断开区域的地图。在此设置中,经典符号规划者不适用或难以适应。我们介绍了一种混合技术,结合了培训的钢筋学习训练的低级政策和基于图的高级古典规划器。除了提供人类可解释的路径之外,该方法还提高了看不见地图中的端到端方法的泛化性能,在那里它在一点上通过复发端到端剂的成功率达到20%的绝对增加要点导航任务,但看不见的大型码1km x 1km。在深入的实验研究中,我们量化了巨大环境中端到端深度RL方法的局限性,我们还介绍了一个新的基准,即很快被释放的环境,可以生成用于导航任务的复杂程序3D地图。
translated by 谷歌翻译
Accurate reporting of energy and carbon usage is essential for understanding the potential climate impacts of machine learning research. We introduce a framework that makes this easier by providing a simple interface for tracking realtime energy consumption and carbon emissions, as well as generating standardized online appendices. Utilizing this framework, we create a leaderboard for energy efficient reinforcement learning algorithms to incentivize responsible research in this area as an example for other areas of machine learning. Finally, based on case studies using our framework, we propose strategies for mitigation of carbon emissions and reduction of energy consumption. By making accounting easier, we hope to further the sustainable development of machine learning experiments and spur more research into energy efficient algorithms.
translated by 谷歌翻译
Understanding the relationship between structure and sentiment is essential in highlighting future operations with online social networks. More specifically, within popular conversation on Twitter. This paper provides a development on the relationship between the two variables: structure, defined as the composition of a directed network, and sentiment, a quantified value of the positive/negative connotations of a conversation. We highlight thread sentiment to be inversely proportional to the strength and connectivity of a network. The second portion of this paper highlights differences in query types, specifically how the aforementioned behavior differs within four key query types. This paper focuses on topical, event-based, geographic, and individual queries as orientations which have differing behavior. Using cross-query analysis, we see that the relationship between structure and sentiment, though still inversely proportional, differs greatly across query types. We find this relationship to be the most clear within the individual queries and the least prevalent within the event-based queries. This paper provides a sociological progression in our understanding of opinion and networks, while providing a methodological advancement for future studies on similar subjects.
translated by 谷歌翻译
We present temporally layered architecture (TLA), a biologically inspired system for temporally adaptive distributed control. TLA layers a fast and a slow controller together to achieve temporal abstraction that allows each layer to focus on a different time-scale. Our design is biologically inspired and draws on the architecture of the human brain which executes actions at different timescales depending on the environment's demands. Such distributed control design is widespread across biological systems because it increases survivability and accuracy in certain and uncertain environments. We demonstrate that TLA can provide many advantages over existing approaches, including persistent exploration, adaptive control, explainable temporal behavior, compute efficiency and distributed control. We present two different algorithms for training TLA: (a) Closed-loop control, where the fast controller is trained over a pre-trained slow controller, allowing better exploration for the fast controller and closed-loop control where the fast controller decides whether to "act-or-not" at each timestep; and (b) Partially open loop control, where the slow controller is trained over a pre-trained fast controller, allowing for open loop-control where the slow controller picks a temporally extended action or defers the next n-actions to the fast controller. We evaluated our method on a suite of continuous control tasks and demonstrate the advantages of TLA over several strong baselines.
translated by 谷歌翻译
Data deprivation, or the lack of easily available and actionable information on the well-being of individuals, is a significant challenge for the developing world and an impediment to the design and operationalization of policies intended to alleviate poverty. In this paper we explore the suitability of data derived from OpenStreetMap to proxy for the location of two crucial public services: schools and health clinics. Thanks to the efforts of thousands of digital humanitarians, online mapping repositories such as OpenStreetMap contain millions of records on buildings and other structures, delineating both their location and often their use. Unfortunately much of this data is locked in complex, unstructured text rendering it seemingly unsuitable for classifying schools or clinics. We apply a scalable, unsupervised learning method to unlabeled OpenStreetMap building data to extract the location of schools and health clinics in ten countries in Africa. We find the topic modeling approach greatly improves performance versus reliance on structured keys alone. We validate our results by comparing schools and clinics identified by our OSM method versus those identified by the WHO, and describe OSM coverage gaps more broadly.
translated by 谷歌翻译
We present a new algorithm for automatically bounding the Taylor remainder series. In the special case of a scalar function $f: \mathbb{R} \mapsto \mathbb{R}$, our algorithm takes as input a reference point $x_0$, trust region $[a, b]$, and integer $k \ge 0$, and returns an interval $I$ such that $f(x) - \sum_{i=0}^k \frac {f^{(i)}(x_0)} {i!} (x - x_0)^i \in I (x - x_0)^{k+1}$ for all $x \in [a, b]$. As in automatic differentiation, the function $f$ is provided to the algorithm in symbolic form, and must be composed of known elementary functions. At a high level, our algorithm has two steps. First, for a variety of commonly-used elementary functions (e.g., $\exp$, $\log$), we derive sharp polynomial upper and lower bounds on the Taylor remainder series. We then recursively combine the bounds for the elementary functions using an interval arithmetic variant of Taylor-mode automatic differentiation. Our algorithm can make efficient use of machine learning hardware accelerators, and we provide an open source implementation in JAX. We then turn our attention to applications. Most notably, we use our new machinery to create the first universal majorization-minimization optimization algorithms: algorithms that iteratively minimize an arbitrary loss using a majorizer that is derived automatically, rather than by hand. Applied to machine learning, this leads to architecture-specific optimizers for training deep networks that converge from any starting point, without hyperparameter tuning. Our experiments show that for some optimization problems, these hyperparameter-free optimizers outperform tuned versions of gradient descent, Adam, and AdaGrad. We also show that our automatically-derived bounds can be used for verified global optimization and numerical integration, and to prove sharper versions of Jensen's inequality.
translated by 谷歌翻译
A typical product or place often has hundreds of reviews, and summarization of these texts is an important and challenging problem. Recent progress on abstractive summarization in domains such as news has been driven by supervised systems trained on hundreds of thousands of news articles paired with human-written summaries. However for opinion texts, such large scale datasets are rarely available. Unsupervised methods, self-training, and few-shot learning approaches bridge that gap. In this work, we present a novel self-training approach, OpineSum, for abstractive opinion summarization. The summaries in this approach are built using a novel application of textual entailment and capture the consensus of opinions across the various reviews for an item. This method can be used to obtain silver-standard summaries on a large scale and train both unsupervised and few-shot abstractive summarization systems. OpineSum achieves state-of-the-art performance in both settings.
translated by 谷歌翻译
The applicability of computational models to the biological world is an active topic of debate. We argue that a useful path forward results from abandoning hard boundaries between categories and adopting an observer-dependent, pragmatic view. Such a view dissolves the contingent dichotomies driven by human cognitive biases (e.g., tendency to oversimplify) and prior technological limitations in favor of a more continuous, gradualist view necessitated by the study of evolution, developmental biology, and intelligent machines. Efforts to re-shape living systems for biomedical or bioengineering purposes require prediction and control of their function at multiple scales. This is challenging for many reasons, one of which is that living systems perform multiple functions in the same place at the same time. We refer to this as "polycomputing" - the ability of the same substrate to simultaneously compute different things. This ability is an important way in which living things are a kind of computer, but not the familiar, linear, deterministic kind; rather, living things are computers in the broad sense of computational materials as reported in the rapidly-growing physical computing literature. We argue that an observer-centered framework for the computations performed by evolved and designed systems will improve the understanding of meso-scale events, as it has already done at quantum and relativistic scales. Here, we review examples of biological and technological polycomputing, and develop the idea that overloading of different functions on the same hardware is an important design principle that helps understand and build both evolved and designed systems. Learning to hack existing polycomputing substrates, as well as evolve and design new ones, will have massive impacts on regenerative medicine, robotics, and computer engineering.
translated by 谷歌翻译
Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets. Despite promising results, current models still suffer from generating factually inconsistent summaries, reducing their utility for real-world application. Several recent efforts attempt to address this by devising models that automatically detect factual inconsistencies in machine generated summaries. However, they focus exclusively on English, a language with abundant resources. In this work, we leverage factual consistency evaluation models to improve multilingual summarization. We explore two intuitive approaches to mitigate hallucinations based on the signal provided by a multilingual NLI model, namely data filtering and controlled generation. Experimental results in the 45 languages from the XLSum dataset show gains over strong baselines in both automatic and human evaluation.
translated by 谷歌翻译